Speech reception via a packet transmission facility



(57) Degradations in packetized voice communications received by a non-synchronized entity, via a packet network, are reduced by adjusting a depth of storage in a jitter buffer of the receiving entity. Units of voice sample data are stored in the jitter buffer as they are received. Stored units are normally extracted and delivered to a processor one at a time at a regular rate for the generation of audible speech. From time to time the rate of extraction can be accelerated by extracting two units while delivering only one. Also the rate of extraction can be retarded by not extracting a unit while delivering a substitute unit in place of the unit that would normally have been extracted. The depth of storage is thereby controllable in response to packet reception events such that delay is minimized while yet providing sufficient delay to smooth variances between reception events.
Description
Field of the Invention
[0001] [...] transported via a communications facility operated in accordance with an Internet Protocol (IP).



Background of the Invention
[0002] [...] the art of telegraphy [...] has traditionally been restricted to the transmission [...]

[...] communication paths are only provided for a data [...] there being at least a minimal amount of data waiting to be transmitted [...] in bursts via communication paths momentarily assigned to any one [...]. [...] data transport can be had via [...] communications on an as needed [...] compared with telephone systems which must assign many communication paths, each [...] communication path for a corresponding one of the many data connections.

[...] there has been a desire to take advantage of the most efficient and convenient and widely available data [...] telephony. One [...] known [...] Internet. The Internet is implemented across various pathways [...] Internet Protocol (IP). The IP is convenient as it permits communications [...] source and destination having to perform any actions in concert [...] using the IP has become popular.

[...] data communications services via the Internet. [...] networks are interfaced with the public telephone systems [...] standard analog telephone line [...] available at almost any [...]. A wider bandwidth connection [...] telephone system offering ISDN service or by directly connecting with a data network [...]. [...] some form of compression [...] required in order to transmit digitally encoded speech signals via [...].

[...] Internet speech connections have been demonstrated using a personal computer [...] a microphone and a speaker and appropriate software or a combination of specialized hardware and software. [...] simulating telephony via the Internet is the relatively very low cost. [...] one has already invested in a personal computer with the typical attendant processing [...] added cost of telephony via the Internet is no more than the cost of software [...].
[0007] In operation samples are taken synchronously [...] microphone analog voice signal and are processed to generate encoded speech [...]. [...] extent dependent upon other functions being executed in the PC. [...] frames have been collected these are transmitted as a packet. Each packet includes [...] a time stamp indicating the time of transmission and the one or more frames [...]. [...] detract from the perceived voice quality of audible speech regenerated [...] via the Internet.

[0008] One problem is that voice is time dependent and sampled voice signals are synchronous in nature, while packet systems are asynchronous in nature. In accordance with [...] without guarantee as to the time of arrival at the intended [...]
the destination is more or less irregular. Furthermore the order in which the packets arrive can be irregular. Irregularities introduced by transport via the IP must be compensated for at the destination, otherwise regenerated speech may be broken and of diminished intelligibility at the destination.

[0009] Another problem is that voice is time dependent and sampled voice signals are synchronous in nature, while the operations of the source and destination PCs are independent one from the other. The clock in one PC is not synchronized with the clock in the other PC. The rate at which the source PC generates encoded samples is never exactly the same as the rate at which the destination PC processes the received samples. Between any two PCs used for a telephone like conversation there is often a mismatch of more than several parts per thousand. Consequently during a conversation the faster PC tends toward operation in an under flow situation while the slower PC tends toward operation in an overflow condition. The under flow condition results in audible breaks in regenerated speech and is compounded by the irregularities introduced by transport via the IP. The overflow condition may be compensated by an ever expanding queue in the PC but this introduces ever increasing delay in the regeneration of the speech at the destination PC. Delays of several seconds can accumulate during a conversation.
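The accumulation of delay from a clock mismatch can be illustrated with simple arithmetic. The following is a minimal sketch only; the function name and the example figures (a 15 minute call and a mismatch of 5 parts per thousand) are assumptions chosen for illustration and are not taken from the specification.

```python
def accumulated_drift_seconds(call_seconds: float,
                              mismatch_parts_per_thousand: float) -> float:
    """Seconds of speech over-produced (or under-produced) relative to the
    consuming clock after call_seconds of conversation."""
    return call_seconds * mismatch_parts_per_thousand / 1000.0


# Assumed example: a 15 minute call with a 5 parts-per-thousand mismatch.
drift = accumulated_drift_seconds(15 * 60, 5.0)
print(f"{drift:.1f} s of queued delay (slow receiver) or starvation (fast receiver)")
```

With these assumed figures the drift is 4.5 seconds, which is of the same order as the several seconds of delay mentioned above.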

[0010] Recently, direct voice access into telephone networks for Internet users has become a commercial reality. This provides a service wherein the PC user may converse directly with a telephone user. The attraction for business enterprise is a new form of communication with a class of customers, Internet users, thought to be more commercially oriented or to have more disposable income than the average individual. In one example a private branch exchange (PBX) is interfaced with an IP network via a voice gateway. The voice gateway is connected via a trunk, line or an IP data link to transmit and receive packets and is connected via several PBX lines, or a PBX TDM loop, to transmit and receive voice signals in the operating protocol of the PBX. A perceived problem in this proposal lies in the realization that through frequent exposure PC users generally become tolerant of degradation and delay in the reproduction of speech while on the other hand telephone users unaccustomed to conversing via the IP are less tolerant. The telephone user may interpret conversational delays as a lack of candor or honesty on the part of the other party to the conversation. Such does not bode well for a business enterprise. Another problem may arise in that a telephone user, accustomed to typical telephone voice quality, may react to breaks in the conversation as signifying an equipment malfunction. Such does not bode well for either the PBX manufacturer or the PBX service provider as they may each suffer increased complaints and depreciation of the goodwill associated with their trademarks.
[0011] When processing packetized voice signal data which is delivered over a non guaranteed quality of service transport facility such as an Internet, there are two primary factors that contribute to a degradation of audible speech reproduction.

[0012] [...] independent of the other at its own independent clock rate. A clock which governs a rate of consumption of speech data at a receiving entity is not synchronized to a clock which governs a rate of production of the speech data at a transmitting entity. The rates of production and consumption are not exactly the same. This leads to a degradation in the quality of the speech being audibly reproduced. Over a period of time this non-synchronized operation has either one of two consequences at the receiving end. When the receiving end clock is too fast the rate of consumption of voice data is too fast. The receiving end is starved for data resulting in momentary breaks in speech reproduction. This is sometimes referred to as an under flow. When the receiving end clock is too slow the rate of consumption of data is too slow. The receiving end has insufficient memory to store data such that parts of the data are lost at the receiving end and not all the speech is audible. This is sometimes referred to as an overflow. This delay increases throughout the duration of the conversation and in the extreme has been observed to exceed 5 seconds during a 15 [...]
[0013] An objective of the invention is to reduce the effects of non-synchronized operation and thereby improve the quality of the perceived speech being audibly reproduced from voice signal data transported via an IP or the like.

[0014] Although as before mentioned, the packets are transmitted more or less regularly, the second problem [...] from unpredictable delays in the transport of individual packets through the data network operated in accordance with the IP. Due to data network traffic variations, transport time of the packets from the source to the destination [...]. During an Internet telephone call the IP's dynamically unique data delivery characteristics such as [...] variances in transport delay, referred to as jitter, and possible loss of packets. The delivery characteristics change, more or less, throughout the course of a single call. Severe jitter may result in an occasional [...] which two packets should be delivered to the receiving entity. A jitter buffer at the receiving entity mitigates the jitter by adding yet more delay. Frames from the incoming packets are stored in the jitter buffer, order being maintained with reference to the time stamps. Upon the initiation of a call, consumption is delayed until the jitter buffer exceeds some predetermined [...] of fullness, whereafter received frames are made available for processing at a regular rate. Hence speech is audibly reproduced via the loudspeaker. Ideally as the rate of delivery fluctuates the fullness of the jitter buffer fluctuates in a corresponding manner, while frames are withdrawn at a regular rate as determined by the clock in the receiving entity for processing. If however the fluctuations are more extreme than expected, momentary under flow and or overflow occurrences will be manifest as speech degradation. This can be mitigated by providing [...] spontaneity and candor on the part of one or both of the parties to the conversation. [...] progressively less effective. An eventual under flow or overflow is accompanied by very noticeable degradation of the speech reproduction in the forms of unusual pauses or speech deletions.

Summary of the Invention 

[...] having been transmitted from a transmission source [...]

a) receiving packets of the units of voice data samples, and one after another storing the units;
b) [...] for audible reproduction of [...]
[...] number [...] and a target [...]

[...] delay in delivery is correspondingly reduced by [...] increasing [...]

[...] reproducing speech signals in response to locally generated clocking pulses, comprises:

a buffer means for receiving packets of the units of voice data samples, and one after another storing the units [...] at a regular rate to the processing means, whereby speech signals are audibly generated; and
[...]

[...] operating in accordance with an IP. The telephone system comprises:

a buffer for storing frames of voice data from packets transported via an IP network;

[...] standard operating [...]
a circuit switch for coupling a one of the user's telephones to receive the voice signals;
a substitute register for storing at least one substitute frame of voice data; and

a gate means responsive to a demand from the frame processor for performing any one of the following steps:

i) extracting two frames of voice data from the buffer while delivering only one of said two frames to the frame processor and thereby reduce the frames stored in the buffer,

ii) extracting only one frame of voice data from the buffer and delivering said one frame to the frame processor, and

iii) delivering a copy of the substitute frame of voice data in the substitute register to the frame processor and thereby increase the frames stored in the buffer;

whereby in operation a time interval, between storing a transported frame in the buffer and coupling the corresponding voice signals to the telephone, is controlled by the telephone system.

Brief Description of the Drawings 

[0022] Example embodiments of the invention are discussed with reference to the accompanying drawings in which: 

Figure 1 is a block diagram which broadly illustrates a typical network wherein speech signals are transferred via the Internet protocol between a PC and an other PC or a telephone set;

Figure 2 is a diagram which broadly illustrates examples of voice data signals and packets as these progress through the network in figure 1;

Figure 3 is a block schematic diagram illustrating an example of a gateway circuit shown in figure 1 in accordance with the invention;

Figure 4 is a block schematic diagram broadly illustrating an example of a PC, shown in figure 1, for among other functions audibly reproducing speech from packetized speech data received via the Internet, in accordance with the invention; and

Figure 5 is a flow diagram which illustrates a sequence of functions by which either of the gateway in figure 3 or the PC in figure 4 is operable in accordance with the invention.



Description 



[0023] By way of introduction, a known arrangement for preparation and transmission of voice data via a network [...] as disclosed by J. C. Lynch et al in United States patent No. 5,649,005 is preferred.
[...] analog voice signal generated [...] every 125 microseconds, from the user's [...] encoded microphone signal samples. In figure 2 the encoded microphone signal samples [...] represented by a linear encoded sample 25 having thirteen binary bits per sample. As time passes a plurality of [...] a frame or unit of samples as shown at 26. Usually, depending upon the software [...] between about 80 and 320 samples are collected into a frame or unit of voice data. When enough samples are gathered the PC compresses [...] in accordance with a speech compression algorithm so that bandwidth [...] telephone line 11 may be met. If however a broader bandwidth coupling [...] for example the link 12, there is no need for such compression. One or more frames are [...] packetized between header and trailer portions to form a packet 27. Each header [...] of intended receiver, a so called time stamp, indicating the time of transmission as well as [...]. [...] example the payload is indicated as being data of a periodic origin, that is encoded [...] normally transmitted more or less regularly but may have to wait for [...] traversing the IP network. The packet 27 is transported through the IP network 10 and eventually [...] destination, which for the purpose of illustration is assumed to be the gateway circuit [...] of compressed frames by processing each received frame in accordance with an expansion algorithm [...] to regenerate [...] of voice data, at a rate determined by the PBX. This is seldom [...] they were originally [...]. The expansion algorithm is substantially a complement [...] however in this example the expansion of the digital signal sample involves [...] pulse code modulation (PCM) standard. The PBX 15 has assigned a communications path via [...] of the links 0-n between one of the telephones 18 and 19 and the gateway circuit 14 [...] the user's telephone set [...]
[0025] As before discussed it is the effect of packet data transport as well as the effect of non-synchronization which from time to time introduces noticeable degradation in the reproduction of audible speech from packetized speech data signals. The gateway circuit, as exemplified in figure 3, [...] telephone data link 13 are converted to binary signal form and presented [...] buffer 52. The buffer 52 includes frame storage locations (a-n) and is [...] the compressed frames of encoded speech data for subsequent extraction. Depending [...] the payload of each packet is either stored in serial order in which the packets [...] or stored in an order in accordance with the packet's associated time stamp. [...] is driven by a frame processor 58. The frame processor may be provided by a [...] purpose microprocessor, being suitably programmed. The rate of [...] is determined by the requirement that the frame processor deliver telephone standard [...] to one of the links 0-n. Hence in normal operation the frames are extracted from the buffer [...] at a regular rate being governed by the rate of regeneration of the telephone standard voice [...]. [...] extracted from the buffer 52 via a signal path 53, by a gate 54, and normally delivered to the frame processor [...]. A gate controller 60 is connected to the gate 54 by a control path 62 such that [...] of the gate 54 may be altered and thereby effectively alter the rate of extraction. The state of [...] of the buffer 52 and the arrival times of packets are monitored via a path [...] by the gate controller 60. Accordingly the number of frames stored within the buffer 52 is optimized. The gate controller 60 [...] effectively decreasing the regular rate and reduces the number of frames stored [...] increasing the regular rate. For example, in order to increase the regular rate the gate controller may accelerate the effective rate of extraction by controlling the gate 54 to extract two frames instead of one [...] frame processor 58, via the signal path 57, while discarding the other of the two frames. [...] in order to decrease the regular rate, the gate controller may decelerate the effective rate [...] gate 54 to extract a substitute frame from a substitute register 56 via a signal path [...] to the frame processor 58 instead of extracting a frame from the buffer 52. The substitute frame [...] represent any predetermined series of voice samples. In one example it is preferred that the frame [...] following an extracted frame of substantially [...]

[0026] [...] a central processing unit (CPU) 31 is at the heart of the PC. Recently PC assemblers have [...] microprocessors manufactured by Intel or Motorola for this purpose. The CPU 31 is coupled via a [...] to a random access memory (RAM) 41 having stored therein an operating system 42, a speech application 43, a reserved buffer space 44 for [...] a jitter buffer function operated by the speech application [...]. The RAM 41 includes a jitter buffer management instruction set 45 [...] of units of voice data from the jitter buffer without effectively altering the rate of delivery of units of voice [...]. [...] frames of voice data are received from the telephone [...] 35 [...] representation of a received packet to a peripheral bus 34 from whence it is transferred via a bus 32 to the CPU 31 under the control of an input output interface unit 36. The CPU 31 responds to the received packet in accordance with the speech application to store the frames in the buffer space, taking notice of the time stamp information having been received in the packet. On a regular basis as determined by a software clock the CPU will normally extract a frame from the buffer space 44 and generate therefrom a series of voice data samples, for example similar to that illustrated at 25 in figure 2. These voice data samples are transferred to a sound card 37 via the buses 32 and 34 under the control of the input output interface unit 36. The sound card responds to the voice data samples by audibly reproducing the speech they represent at the loudspeaker 22.

[0027] As before discussed it is the effect of packet data transport as well as the effect of non-synchronization which from time to time introduces noticeable degradation in the reproduction of audible speech from packetized speech data signals. The flow diagram in figure 5 is one example of a method for controlling a flow of units of voice sample data, preparatory to audibly reproducing speech signals from packetized units of the voice data samples having been transmitted from a transmission source via a packet communications facility. The principle of operation illustrated by the flow diagram is applicable in relation to either of figures 3 and 4, however for convenience of description the functions are referenced to figure 4. Illustrated functions of receive a transported packet 71 and subsequently store a payload 74 are part of the speech application 43 in the RAM 41. The normal function of the speech application 43 is modified in accordance with the flow diagram by the jitter buffer management instruction set 45. At the beginning of an Internet voice call the buffer space 44 is empty. When a packet is received it is checked by an interrogation function 72. If the payload is speech sample data, the time stamp is compared by interrogation function 73 with any previously received time stamp for which there yet remains a frame in the jitter buffer. If there is a frame of earlier origin or no frame, the frame or frames of the payload are stored by the function 74. If among any stored frames there is none of earlier origin, the packet is deemed to have been received too late and it is discarded while a late count is incremented, as shown at function block 76. Function block 77 requires generation of a root mean squared (RMS) value on the basis of differences of time between subsequent payload storage events. The RMS value and the number of late counts are used, as the reception and storage of payloads continues, to calculate a desired target for a number of frames stored in the jitter buffer. This is referred to as buffer depth. An average of the buffer depth is determined in function block 75 which keeps a running tally of the frame or frames stored during each payload storage event. As will be discussed later the running tally is used in determining an average buffer depth.
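The reception-side bookkeeping described above (late discard at 76, RMS value at 77) may be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the class and attribute names are hypothetical, the jitter buffer is stood in for by a dictionary keyed on time stamp, and the RMS value is assumed to be taken over the deviation of each storage interval from an assumed nominal packet interval, a detail the description leaves open.

```python
import math


class JitterReceiver:
    """Hypothetical sketch of the late-packet test and RMS jitter measure."""

    def __init__(self, nominal_interval: float):
        self.nominal_interval = nominal_interval  # assumed seconds between packets
        self.frames = {}                          # time stamp -> frame (stand-in buffer)
        self.late_count = 0
        self._last_arrival = None
        self._squared_deviations = []

    def on_packet(self, time_stamp: int, frame: bytes, arrival_time: float) -> bool:
        """Store the frame, or discard it as late; returns True when stored."""
        # Late: frames remain in the buffer and none is of earlier origin.
        if self.frames and time_stamp < min(self.frames):
            self.late_count += 1
            return False
        self.frames[time_stamp] = frame
        if self._last_arrival is not None:
            deviation = (arrival_time - self._last_arrival) - self.nominal_interval
            self._squared_deviations.append(deviation ** 2)
        self._last_arrival = arrival_time
        return True

    def rms_jitter(self) -> float:
        """Root mean square of the storage-interval deviations seen so far."""
        if not self._squared_deviations:
            return 0.0
        return math.sqrt(sum(self._squared_deviations) / len(self._squared_deviations))


rx = JitterReceiver(nominal_interval=0.020)
rx.on_packet(160, b"a", 0.000)
rx.on_packet(320, b"b", 0.030)    # arrives 10 ms later than nominal
rx.on_packet(0, b"late", 0.035)   # earlier than every stored frame: discarded
print(rx.late_count, sorted(rx.frames), rx.rms_jitter())
```

A packet whose time stamp precedes every frame still held is counted as late and dropped; every stored payload contributes one interval to the RMS measure.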

[0028] The functions described in the preceding paragraph provide for the storage of payloads to the exclusion of late payloads and for data based on these events. The occurrences of these events are dependent upon packet origin at a transmitting entity and packet transport via the IP network. In contrast the functions described in the following paragraph are dependent upon a local clock in the receiving entity, which for all practical purposes is unsynchronized with respect to the transmitting entity.

[0029] A speech frame clock rate is dependent upon the rate of utilization of individual encoded samples in the sound card 37 or upon the rate of an assigned TDM channel occurrence in the PBX 15. An occurrence of a speech frame clock indicated at 81 is a demand that a frame of speech sample data be delivered to a frame processor, for example as implemented in the CPU 31 by the speech processor application 43. The majority of speech frame clock occurrences will result in a single frame being extracted from the jitter buffer and being delivered to the speech processor application. A speech frame clock occurrence is detected at 82 and results in the jitter buffer being checked for the presence of at least one frame as shown at 83. If the jitter buffer is empty a substitute frame is delivered as required by a function block 85. If there is one frame or more in the jitter buffer, the results of the functions 78 and 75 are compared at interrogation function 84. If the average buffer depth is short of the target by more than half a frame, a substitute frame is delivered as required by the function block 85. On the other hand if the buffer depth is not short of the target by more than half a frame the results of the functions 78 and 75 are compared at interrogation function 86. Here if the average buffer depth exceeds the target by more than half a frame, function 87 extracts the next two frames from the jitter buffer and delivers only one of the extracted frames to the frame processor. The remaining extracted frame is discarded. If the average buffer depth is within half a frame of the target, it is deemed to be satisfactory and a single frame is extracted from the jitter buffer and delivered to the frame processor.
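The decision made at each speech frame clock occurrence (functions 83 to 87) may be sketched as follows. This is an illustrative sketch only. The shortfall test is read here as "short of the target by more than half a frame", which is the reading consistent with the closing statement that a depth within half a frame of the target is satisfactory. The names are hypothetical, the choice of delivering the first of the two extracted frames is an assumption, and the guard requiring at least two buffered frames before a double extraction is an added safety assumption.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    SUBSTITUTE = "substitute"   # deliver a substitute frame, extract nothing
    SINGLE = "single"           # extract one frame and deliver it
    DOUBLE = "double"           # extract two frames, deliver one, discard one


def choose_action(buffered_frames: int, average_depth: float,
                  target_depth: float) -> Action:
    """Pick the extraction mode for one speech frame clock occurrence."""
    if buffered_frames == 0:
        return Action.SUBSTITUTE
    if target_depth - average_depth > 0.5:
        return Action.SUBSTITUTE
    # Assumed guard: a double extraction needs two frames to be present.
    if average_depth - target_depth > 0.5 and buffered_frames >= 2:
        return Action.DOUBLE
    return Action.SINGLE


def deliver(buffer: deque, average_depth: float, target_depth: float, substitute):
    """Return the frame handed to the frame processor, updating the buffer."""
    action = choose_action(len(buffer), average_depth, target_depth)
    if action is Action.SUBSTITUTE:
        return substitute
    frame = buffer.popleft()
    if action is Action.DOUBLE:
        buffer.popleft()        # the second extracted frame is discarded
    return frame
```

Delivering a substitute leaves the buffer untouched so its depth grows as packets continue to arrive, while a double extraction shrinks the depth by one extra frame.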

[0030] By managing the jitter buffer depth as hereinbefore disclosed, delay is dynamically adjusted toward an optimal minimum while being balanced against the requirement of reduced occurrences of frame losses and substitutions. Those frame irregularities that do occur tend to be distributed and hence lesser degradation of speech reproduction is perceived.
[0031] In one example, the value of the substitute frame in function 85 is chosen to be one of a silent speech frame and an interpolation frame. The choice at any one instant is dependent upon the preceding frame having been substantially representative of an absence of voiced sounds or a presence of voiced sounds.
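The choice of substitute frame may be sketched as follows. This is a hedged illustration only: the description does not say how voiced sounds are detected nor how an interpolation frame is formed, so the mean-magnitude threshold and the half-amplitude copy of the preceding frame used here are assumptions standing in for those unspecified steps, and all names are hypothetical.

```python
from typing import List


def choose_substitute(previous_frame: List[int],
                      voiced_threshold: float = 500.0) -> List[int]:
    """Return a silent frame after an unvoiced frame, else a stand-in
    interpolation frame derived from the preceding frame."""
    if not previous_frame:
        return []
    mean_magnitude = sum(abs(s) for s in previous_frame) / len(previous_frame)
    if mean_magnitude < voiced_threshold:
        # Preceding frame assumed to carry no voiced sound: deliver silence.
        return [0] * len(previous_frame)
    # Assumed stand-in for interpolation: attenuated copy of the preceding frame.
    return [s // 2 for s in previous_frame]
```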

[0032] In another example the substitute frame in function 85 is chosen to be a silent speech frame with its delivery being held in abeyance until one silent frame extraction and delivery has occurred, or until several contiguous silent frame extractions and deliveries have occurred. This has the advantage of adding to the buffer depth without introducing any irregularity into the generated audible speech.

[0033] A realization of the desired target in function 78 has been calculated as follows:

    depth = RMS jitter from function 77 multiplied by a constant A
    If discarded packets in function 76 is greater than a constant B%
        then Target = the depth multiplied by a constant C
    If discarded packets in function 76 is less than a constant D%
        then Target = the depth multiplied by a constant K

where each of the constants A, B, C, D and K is a value which is optimized experimentally.
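The target calculation may be sketched as follows. This is a minimal sketch under stated assumptions: the discarded-packet percentage is assumed to be taken relative to all packets received, the target is assumed to equal the unscaled depth when neither threshold is crossed, D is assumed to be smaller than B, and the constant values in the example are invented for illustration since the description only says they are optimized experimentally.

```python
def target_depth(rms_jitter: float, late_count: int, total_count: int,
                 a: float, b_percent: float, c: float,
                 d_percent: float, k: float) -> float:
    """Target number of frames to hold in the jitter buffer."""
    depth = rms_jitter * a
    # Assumed: percentage of discarded (late) packets among all received.
    late_percent = 100.0 * late_count / total_count if total_count else 0.0
    if late_percent > b_percent:
        return depth * c      # many late packets: scale the depth by C
    if late_percent < d_percent:
        return depth * k      # few late packets: scale the depth by K
    return depth              # assumed behaviour between the two thresholds


# Hypothetical constants: A=1.5, B=5%, C=1.2, D=1%, K=0.9.
print(target_depth(2.0, 6, 100, a=1.5, b_percent=5.0, c=1.2, d_percent=1.0, k=0.9))
```

In this sketch a constant C greater than one deepens the buffer when too many packets arrive late, and a constant K less than one trims the delay when almost none do.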
[0034] It is envisaged that some further improvement can be realized by [...]

[...] of the receiving entity. Units of voice sample data are stored in the jitter buffer as they are received. Stored units are normally extracted and delivered to a processor one at a time at a regular rate [...]. From time to time the rate of extraction can be accelerated by extracting two units while delivering only one. Also the rate of extraction can be retarded by not extracting a unit while delivering a substitute unit in place of the unit that would normally have been extracted. The depth of storage is thereby controllable in response to packet reception events such that delay is minimized while yet providing sufficient delay to smooth variances between reception events.






EP 0 921 666 A2 



Appendix A: Jitter Queue Pseudo Code 
Data Structures and Constants

Structure FifoDataType {
    int TimeStamp;
    enum ControlCode;
    long Data[16];
}

Structure FifoType {
    FifoDataType FifoData[FIFOSIZE];
    int HeadPtr;
    int HeadTime;       // Time in 8 kHz samples
    int MaxTime;
    int Length;         // Time in 8 kHz samples
    int JitterTime;     // Time in 8 kHz samples
    float Tau;          // Filter time constant
    float Error;
    float Accum;
}

Structure {
    int Total;
    int Late;
    int Early;
    int Reorder;
    int Duplicate;
} PegCount;

RTP Packet Process

Initialize;
    Init all frames in Jitter Queue = FRAMEERASURE;
    Clear JitterQReady Flag;

Repeat
    Wait for notification of packet arrival;
    Get Packet;
    PutFrames(Packet, FIRST);
Until RTPTimeStamp > Jitter Buffer Time;
Set JitterQReady Flag;

Repeat
    Wait for notification of packet arrival;
    Get Packet;
    PutFrames(Packet, INORDER);
Until audio channel shut down;
PutFrames

PutFrames(Packet, Mode)
    int WriteTime;

    if Mode = FIRST {
        initialize
    } else {
        Start Critical Section;

        // Copy frames from RTP packet to Fifo buffer
        // The frames are written to a location depending on the RTP timestamp value
        // This effectively reorders any out of order packets
        // If packets are duplicates the most recent is discarded
        for I = 0, Number of Frames in Packet - 1
            WriteTime = RTPTimeStamp + I * SamplesPerFrame[Codec];
            PegCount.Total++;

            if WriteTime < Fifo.HeadTime {
                PegCount.Late++;
                return;
            }
            [...]
                return;

            InputPtr = ((WriteTime - Fifo.HeadTime) / SamplesPerFrame[Codec] [...]

            if Fifo.FifoData[InputPtr].ControlCode != FRAMEERASURE {
                PegCount.Duplicate++;
                return;
            }

            if frame = SILENTFRAME {
                Copy silence frame to Fifo.FifoData[InputPtr].Data[];
                Fifo.FifoData[InputPtr].TimeStamp = WriteTime;
                Fifo.FifoData[InputPtr].ControlCode = SILENCE;
                if WriteTime > Fifo.MaxTime {
                    Fifo.MaxTime = WriteTime;
                    Fifo.Length = Fifo.HeadTime - WriteTime;
                } else PegCount.Reorder++;
            } else {
                Copy frame to Fifo.FifoData[InputPtr].Data[];
                Fifo.FifoData[InputPtr].TimeStamp = WriteTime;
                Fifo.FifoData[InputPtr].ControlCode = VALIDDATA;
                if WriteTime > Fifo.MaxTime {
                    Fifo.MaxTime = WriteTime;
                    Fifo.Length = Fifo.HeadTime - WriteTime;
                } else PegCount.Reorder++;
            }
        end for;

        End Critical Section;
    }
    return;
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I i ramc Process 

GetFrame(Frame) 

Start Critical Section; 

// This code does not do synchronization 
Copy Fifo.FifoDafa[F.fo.HeadPtr].ControlCode to Codec 
Copy Fifo.FifoData{Fifo.HeadPtr].Datan to Codec 
F(fo.FifoData[Fifo.HeadPtr].ControlCode = FRAMEERASURE; 

Fifo.HeadPtr = (Fifo.HeadPtr + 1) % FIFOSIZE" 
Fifo.HeadTime Fifo.HeadTime + SamplesPerFramefCodec]; 

// Phase Lock 

// The phase lock algonthm attempts to dnve the error term to 
// a constant value (i.e. analogous to the VCO voltage) 

which .8 equivalent to having the average length of the jitter Q 
// be the initial Jitter time. 

// Accum accumulates the error at each frame. When Accum > Frame 

a frame ,s discarded. Wher. Accurr. Frarr^e a frame is inserted 
// Positive error means the jitter Q is filling up 
// Negative error means the jitter Q is emptying. 

if Fifo.Accum < -SamplesPerFrame[Codec] / 2 { 

// The Q is emptying so insert a Frame Erasure frame 
Copy FRAMEERASURE ControlCode to Codec 
r^elurn^''''"' " F*<o.Accum + SamplesPerFrame[Codec]; 

} 

if Fifo.Accum > SamplesPerFrame[Codec] / 2 {

    // The Q is growing so discard a frame

    Fifo.FifoData[Fifo.HeadPtr].ControlCode = FRAMEERASURE;
    Fifo.HeadPtr = (Fifo.HeadPtr + 1) % FIFOSIZE;
    Fifo.HeadTime = Fifo.HeadTime + SamplesPerFrame[Codec];
    Fifo.Length = Fifo.Length - SamplesPerFrame[Codec];
    Fifo.Accum = Fifo.Accum - SamplesPerFrame[Codec];

}

// Normal delivery of the frame at the head of the Q
Copy Fifo.FifoData[Fifo.HeadPtr].ControlCode to Codec
Copy Fifo.FifoData[Fifo.HeadPtr].Data[] to Codec
Fifo.FifoData[Fifo.HeadPtr].ControlCode = FRAMEERASURE;
Fifo.HeadPtr = (Fifo.HeadPtr + 1) % FIFOSIZE;
Fifo.HeadTime = Fifo.HeadTime + SamplesPerFrame[Codec];
Fifo.Length = Fifo.Length - SamplesPerFrame[Codec];



End Critical Section; 
return; 
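
The phase-lock portion of the listing can be restated as a short runnable sketch. This is only an illustration under assumptions: the names `Fifo`, `pop_head` and `get_frame`, the ring size and the frame length of 80 samples are hypothetical, and the step that accumulates the error into `accum` lies outside this excerpt, so the sketch takes `accum` as given.

```python
from dataclasses import dataclass, field

FIFO_SIZE = 8            # hypothetical ring size
SAMPLES_PER_FRAME = 80   # hypothetical frame length in samples
VALID, ERASURE = "VALIDDATA", "FRAMEERASURE"


@dataclass
class Fifo:
    slots: list = field(default_factory=lambda: [(ERASURE, None)] * FIFO_SIZE)
    head_ptr: int = 0
    head_time: int = 0
    length: int = 0      # queued samples
    accum: int = 0       # accumulated error, in samples

    def pop_head(self):
        """Read the head slot, mark it as erased, and advance the head."""
        frame = self.slots[self.head_ptr]
        self.slots[self.head_ptr] = (ERASURE, None)
        self.head_ptr = (self.head_ptr + 1) % FIFO_SIZE
        self.head_time += SAMPLES_PER_FRAME
        self.length -= SAMPLES_PER_FRAME
        return frame


def get_frame(fifo):
    """Return the (control code, data) pair to hand to the codec."""
    half = SAMPLES_PER_FRAME / 2
    if fifo.accum < -half:
        # Queue is emptying: deliver an erasure frame, extract nothing.
        fifo.accum += SAMPLES_PER_FRAME
        return (ERASURE, None)
    if fifo.accum > half:
        # Queue is growing: discard one frame before the normal delivery.
        fifo.pop_head()
        fifo.accum -= SAMPLES_PER_FRAME
    return fifo.pop_head()
```

Each correction moves `accum` back toward zero by one frame, so a single inserted or discarded frame cancels roughly one frame of accumulated drift.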






EP 0 921 666 A2 

Claims 



1. A method for controlling a flow of units of voice sample data, preparatory to audibly reproducing speech signals
from packetized units of the voice data samples having been received in packet payloads via a communications
facility from an unsynchronised source, the method for controlling the flow of units comprising the steps of:

a) receiving packets of the units of voice data samples and one after another storing the units of voice data
samples;

b) delivering the units stored in step a) one after another, at a regular rate, for audible reproduction of said
speech signals;

c) from time to time determining a difference between a target number and the number of stored units awaiting
delivery; and

d) altering said regular rate to change an average number of the stored units awaiting delivery whereby an
average delay in delivery of the units stored in step a) is altered toward a target delay.

2. The method according to claim 1 wherein the packets received in step a) each includes a time stamp and the units
of voice data samples are delivered in step b) in chronological order.

3. A method according to claim 2 further comprising the step of discarding any received packet having a time stamp
which is earlier than an earliest time stamp in association with a presently stored frame.
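
The ordering and discard rules of claims 2 and 3 can be illustrated with a small sketch. It assumes integer time stamps and uses a heap to keep the earliest stored unit at the front; the class name `TimestampedStore` and its methods are hypothetical and are not taken from the specification.

```python
import heapq
import itertools


class TimestampedStore:
    """Holds time-stamped units and releases them oldest first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal time stamps
        self.discarded = 0

    def store(self, timestamp, unit):
        # Claim 3: discard a packet whose time stamp is earlier than the
        # earliest time stamp among the presently stored units.
        if self._heap and timestamp < self._heap[0][0]:
            self.discarded += 1
            return False
        heapq.heappush(self._heap, (timestamp, next(self._seq), unit))
        return True

    def deliver(self):
        # Claim 2: units are delivered in chronological order.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A packet that arrives out of order but is still later than the earliest stored unit is accepted and sorted into place; only a packet older than everything in the store is dropped.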

4. A method according to claim 1 comprising the further steps of:

e) determining a tendency in a range of time intervals between the occurrences of packet payload storage in
step a); and

i) in a case where the range tends toward a reduction, the delay in delivery is correspondingly reduced by
momentarily increasing said regular rate, and

ii) in a case where the range tends toward an increase, the delay in delivery is correspondingly increased by
momentarily reducing said regular rate.

5. A method according to any one of claims 1 to 4, wherein the regular rate is increased by extracting two of said
units in a time in which only one unit is normally extracted at said regular rate, and delivering only one of the two
extracted units for use in audible generation of the speech signal, before extracting and delivering the next unit.

6. A method according to any one of claims 1 to 4, wherein the regular rate is retarded by not extracting the next of
said units in a time at which one unit is normally extracted at said regular rate and delivering a substitute unit and
the previously retrieved unit for producing a speech signal before retrieving and using said next of said units.
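
As a rough illustration of claims 5 and 6, the sketch below picks one unit to deliver per clock tick by comparing the buffer depth with a target number. The function name `gate`, the `tolerance` band and the choice to discard the older of the two extracted units are assumptions made for the example; the claims only require that one of the two extracted units be delivered.

```python
from collections import deque


def gate(buffer, target, substitute, tolerance=1):
    """Return the unit to deliver for one tick.

    buffer is a deque of stored units with the oldest unit on the left.
    """
    depth = len(buffer)
    if depth > target + tolerance and depth >= 2:
        # Accelerate: extract two units, deliver only one of them.
        buffer.popleft()            # extracted and discarded
        return buffer.popleft()
    if depth == 0 or depth < target - tolerance:
        # Retard: extract nothing and deliver the substitute unit.
        return substitute
    # Normal case: extract and deliver one unit.
    return buffer.popleft()
```

Accelerating shortens the queue by one extra unit per tick and retarding lets it grow by one, which is how the average delay is moved toward the target.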

7. The method according to claim 1, wherein, while there is at least one stored unit, step a) is limited to storing a
received packet payload being associated with a time stamp which is of a later time than a time stamp associated
with said at least one unit, and otherwise discarding the received packet payload; and

wherein the target number is related to differences between times of packet payload storage events.

8. A control means for controlling a flow of units of voice sample data to a processing means being operative in
response to clocking pulses for audibly reproducing speech signals from packetized units of the voice data samples
having been transmitted from a transmission source via a packet communications facility, the control means comprising:

buffer means for receiving packets of the units of voice data samples, and one after another storing the units;

gating means for extracting the stored units one after another at a regular rate, dependent upon said clocking
pulses, and delivering a unit to the processing means whereby speech signals are audibly reproduced;

[...] an average delay in delivery of the stored units is altered toward a target delay.

9. A control means according to claim 8 further comprising
means for determining said difference.

10. A control means according to claim 8 further comprising:

means for determining a tendency in a range of time intervals between occurrences of packet reception; and

means for adjusting said target number, said means being responsive to

i) the range tending toward a reduction by reducing the target number and

ii) the range tending toward an increase by increasing the target number.
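
One way to picture the target adjustment of claim 10 is sketched below. It is an assumption-laden illustration: the "range" is taken as the spread between the longest and shortest gap between packet arrivals in a window, the "tendency" is a comparison of two successive windows, and the one-unit step and the `minimum` and `maximum` bounds are hypothetical.

```python
def interval_range(arrival_times):
    """Spread (max minus min) of the gaps between consecutive arrivals."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return max(gaps) - min(gaps) if gaps else 0


def adjust_target(target, previous_range, current_range, minimum=1, maximum=10):
    """Move the target number one unit in the direction the range is tending."""
    if current_range < previous_range:
        return max(minimum, target - 1)   # arrivals steadier: less buffering
    if current_range > previous_range:
        return min(maximum, target + 1)   # arrivals more erratic: more buffering
    return target
```

For example, with arrivals at 0, 20, 40 and 60 ms the range is 0, while arrivals at 0, 20, 55 and 60 ms give a range of 30 ms, so the target number would be raised by one unit.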

11. [...] of a later time than the time of a time stamp associated with said at least one unit.

12. A control means according to claim 8 further comprising:

a substitute register for storing a substitute unit; and

[...] the gating means to deliver a copy of the substitute unit from the substitute register, to increase the average
delay toward the delay target.

13. A control means according to claim 8 further comprising:

a substitute buffer having a substitute unit permanently stored therein; and

[...] to deliver a substitute unit [...]

14. A telephone system for serving a plurality of telephones, comprising:

a buffer for storing frames of voice data from packets transported via an IP network;

[...] a standard operating [...]

a circuit switch for coupling a one of the plurality of telephones to receive the voice signals;

a substitute register for storing at least one substitute frame of voice data; and

a gate means responsive to a demand from the frame processor for performing any one of the following steps:

i) [...]

ii) [...] said one frame to the frame [...]

iii) delivering a copy of the substitute frame of voice data in the substitute register to the frame processor
and thereby increase the frames stored in the buffer;

whereby in operation a time interval, between storing a transported frame in the buffer and coupling the
corresponding voice signals to the telephone, is controlled by the telephone system.

[...] method comprising the steps of:

a) providing said buffer with a predetermined amount of storage space for storing data;

b) storing the data in the buffer as the data is received;

c) extracting a unit of the stored data in response to each requirement from the system for utilization of data
and delivering the unit of extracted data to the system;

d) in step c), from time to time in response to a tendency toward the overflow condition, extracting another
unit of the stored data and discarding the extracted data; and

e) in step c), from time to time in response to a tendency toward the underflow condition, not extracting a unit
of the stored data and delivering a substitute unit of data to the system;

wherein an occurrence of an underflow condition or an occurrence of an overflow condition is substantially mitigated
during said period of time.